Overview

Dataset statistics

Number of variables: 16
Number of observations: 157
Missing cells: 51
Missing cells (%): 2.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.3 KiB
Average record size in memory: 354.5 B
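
The percentage figures in this overview follow directly from the counts. A minimal sketch of that arithmetic, assuming a percentage is simply the missing count divided by the relevant total (all cells for the dataset, all observations for one variable). The helper name `percent` is illustrative and not part of pandas-profiling:

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts copied from the dataset statistics in this report.
n_variables = 16
n_observations = 157
missing_cells = 51

# Every variable has one cell per observation.
total_cells = n_variables * n_observations
missing_cells_pct = percent(missing_cells, total_cells)

# The same helper applies per variable, e.g. the 36 missing
# four_year_resale_value entries out of 157 observations.
resale_missing_pct = percent(36, n_observations)

print(missing_cells_pct, resale_missing_pct)
```

Both results agree with the values the report shows (2.0% of cells overall, 22.9% for four_year_resale_value).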

Variable types

NUM: 12
CAT: 4

Reproduction

Analysis started: 2020-05-12 06:38:48.939151
Analysis finished: 2020-05-12 06:39:39.864707
Duration: 50.93 seconds
Version: pandas-profiling v2.7.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Model has a high cardinality: 156 distinct values (High cardinality)
Latest_Launch has a high cardinality: 130 distinct values (High cardinality)
Price_in_thousands is highly correlated with four_year_resale_value (High correlation)
four_year_resale_value is highly correlated with Price_in_thousands (High correlation)
Power_perf_factor is highly correlated with Horsepower (High correlation)
Horsepower is highly correlated with Power_perf_factor (High correlation)
four_year_resale_value has 36 (22.9%) missing values (Missing)
Price_in_thousands has 2 (1.3%) missing values (Missing)
Curb_weight has 2 (1.3%) missing values (Missing)
Fuel_efficiency has 3 (1.9%) missing values (Missing)
Power_perf_factor has 2 (1.3%) missing values (Missing)
Model is uniformly distributed (Uniform)
Latest_Launch is uniformly distributed (Uniform)
Sales_in_thousands has unique values (Unique)

Variables

Manufacturer
Categorical

Distinct count: 30
Unique (%): 19.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Common values

Value | Count | Frequency (%)
Ford | 11 | 7.0%
Dodge | 11 | 7.0%
Mercedes-B | 9 | 5.7%
Toyota | 9 | 5.7%
Chevrolet | 9 | 5.7%
Mitsubishi | 7 | 4.5%
Chrysler | 7 | 4.5%
Nissan | 7 | 4.5%
Volkswagen | 6 | 3.8%
Pontiac | 6 | 3.8%
Other values (20) | 75 | 47.8%

Length

Max length: 10
Mean length: 6.707006369
Min length: 3

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 23 | 56.1%
Uppercase_Letter | 17 | 41.5%
Dash_Punctuation | 1 | 2.4%

Unicode scripts

Value | Count | Frequency (%)
Latin | 40 | 97.6%
Common | 1 | 2.4%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 41 | 100.0%

Model
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 156
Unique (%): 99.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Common values

Value | Count | Frequency (%)
Neon | 2 | 1.3%
Ram Van | 1 | 0.6%
RAV4 | 1 | 0.6%
Xterra | 1 | 0.6%
5-Sep | 1 | 0.6%
Sonata | 1 | 0.6%
Breeze | 1 | 0.6%
Ram Pickup | 1 | 0.6%
Cutlass | 1 | 0.6%
Park Avenue | 1 | 0.6%
Other values (146) | 146 | 93.0%

Length

Max length: 14
Mean length: 6.554140127
Min length: 2

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 25 | 41.7%
Uppercase_Letter | 23 | 38.3%
Decimal_Number | 8 | 13.3%
Other_Punctuation | 2 | 3.3%
Space_Separator | 1 | 1.7%
Dash_Punctuation | 1 | 1.7%

Unicode scripts

Value | Count | Frequency (%)
Latin | 48 | 80.0%
Common | 12 | 20.0%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 60 | 100.0%

Sales_in_thousands
Real number (ℝ≥0)

UNIQUE
Distinct count: 157
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.99807643312102
Minimum: 0.11
Maximum: 540.561
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 0.11
5-th percentile: 1.8708
Q1: 14.114
Median: 29.45
Q3: 67.956
95-th percentile: 185.3362
Maximum: 540.561
Range: 540.451
Interquartile range (IQR): 53.842

Descriptive statistics

Standard deviation: 68.029422
Coefficient of variation (CV): 1.283620587
Kurtosis: 17.55734423
Mean: 52.99807643
Median Absolute Deviation (MAD): 20.468
Skewness: 3.408518366
Sum: 8320.698
Variance: 4628.002257

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
63.729 | 1 | 0.6%
38.554 | 1 | 0.6%
33.269 | 1 | 0.6%
16.774 | 1 | 0.6%
8.472 | 1 | 0.6%
11.592 | 1 | 0.6%
48.911 | 1 | 0.6%
0.11 | 1 | 0.6%
54.158 | 1 | 0.6%
39.572 | 1 | 0.6%
Other values (147) | 147 | 93.6%

Minimum 5 values

Value | Count | Frequency (%)
0.11 | 1 | 0.6%
0.916 | 1 | 0.6%
0.954 | 1 | 0.6%
1.112 | 1 | 0.6%
1.28 | 1 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
540.561 | 1 | 0.6%
276.747 | 1 | 0.6%
247.994 | 1 | 0.6%
245.815 | 1 | 0.6%
230.902 | 1 | 0.6%
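
Several of the Sales_in_thousands statistics are simple functions of the others. The sketch below recomputes them from the reported values, assuming the usual definitions (range is maximum minus minimum, IQR is Q3 minus Q1, CV is standard deviation divided by mean, variance is the squared standard deviation). It is a consistency check on the report, not pandas-profiling's own code:

```python
import math

# Values copied from the Sales_in_thousands section of this report.
minimum, maximum = 0.11, 540.561
q1, q3 = 14.114, 67.956
mean, std = 52.99807643, 68.029422
total, n_observations = 8320.698, 157  # no missing values for this variable

value_range = maximum - minimum         # reported Range: 540.451
iqr = q3 - q1                           # reported IQR: 53.842
cv = std / mean                         # reported CV: 1.283620587
variance = std ** 2                     # reported Variance: 4628.002257
mean_from_sum = total / n_observations  # reported Mean: 52.99807643

# Compare with the reported figures, allowing for rounding in the report.
checks = {
    "range": math.isclose(value_range, 540.451, rel_tol=1e-9),
    "iqr": math.isclose(iqr, 53.842, rel_tol=1e-9),
    "cv": math.isclose(cv, 1.283620587, rel_tol=1e-6),
    "variance": math.isclose(variance, 4628.002257, rel_tol=1e-6),
    "mean": math.isclose(mean_from_sum, mean, rel_tol=1e-8),
}
print(checks)
```

The same relations can be checked for any other numeric variable by substituting its reported values; for variables with missing entries, the mean divides by the number of non-missing observations.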
 

four_year_resale_value
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 117
Unique (%): 96.7%
Missing: 36
Missing (%): 22.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.07297520661157
Minimum: 5.16
Maximum: 67.55
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 5.16
5-th percentile: 7.85
Q1: 11.26
Median: 14.18
Q3: 19.875
95-th percentile: 41.25
Maximum: 67.55
Range: 62.39
Interquartile range (IQR): 8.615

Descriptive statistics

Standard deviation: 11.4533841
Coefficient of variation (CV): 0.6337298629
Kurtosis: 5.763855916
Mean: 18.07297521
Median Absolute Deviation (MAD): 3.96
Skewness: 2.294915493
Sum: 2186.83
Variance: 131.1800073

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
12.025 | 2 | 1.3%
7.75 | 2 | 1.3%
18.225 | 2 | 1.3%
16.575 | 2 | 1.3%
13.245 | 1 | 0.6%
14.01 | 1 | 0.6%
58.6 | 1 | 0.6%
22.255 | 1 | 0.6%
12.545 | 1 | 0.6%
10.31 | 1 | 0.6%
Other values (107) | 107 | 68.2%
(Missing) | 36 | 22.9%

Minimum 5 values

Value | Count | Frequency (%)
5.16 | 1 | 0.6%
5.86 | 1 | 0.6%
7.425 | 1 | 0.6%
7.75 | 2 | 1.3%
7.825 | 1 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
67.55 | 1 | 0.6%
60.625 | 1 | 0.6%
58.6 | 1 | 0.6%
58.47 | 1 | 0.6%
50.375 | 1 | 0.6%

Vehicle_type
Categorical

Distinct count: 2
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Common values

Value | Count | Frequency (%)
Passenger | 116 | 73.9%
Car | 41 | 26.1%

Length

Max length: 9
Mean length: 7.433121019
Min length: 3

Unicode categories

Value | Count | Frequency (%)
Lowercase_Letter | 6 | 75.0%
Uppercase_Letter | 2 | 25.0%

Unicode scripts

Value | Count | Frequency (%)
Latin | 8 | 100.0%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 8 | 100.0%

Price_in_thousands
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 152
Unique (%): 98.1%
Missing: 2
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.390754838709682
Minimum: 9.235
Maximum: 85.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 9.235
5-th percentile: 12.469
Q1: 18.0175
Median: 22.799
Q3: 31.9475
95-th percentile: 55.835
Maximum: 85.5
Range: 76.265
Interquartile range (IQR): 13.93

Descriptive statistics

Standard deviation: 14.35165319
Coefficient of variation (CV): 0.5239597548
Kurtosis: 3.63041233
Mean: 27.39075484
Median Absolute Deviation (MAD): 6.099
Skewness: 1.765734331
Sum: 4245.567
Variance: 205.9699493

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
18.89 | 2 | 1.3%
38.9 | 2 | 1.3%
12.64 | 2 | 1.3%
71.02 | 1 | 0.6%
24.495 | 1 | 0.6%
31.965 | 1 | 0.6%
26.399 | 1 | 0.6%
31.505 | 1 | 0.6%
22.51 | 1 | 0.6%
28.8 | 1 | 0.6%
Other values (142) | 142 | 90.4%
(Missing) | 2 | 1.3%

Minimum 5 values

Value | Count | Frequency (%)
9.235 | 1 | 0.6%
9.699 | 1 | 0.6%
10.685 | 1 | 0.6%
11.528 | 1 | 0.6%
11.799 | 1 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
85.5 | 1 | 0.6%
82.6 | 1 | 0.6%
74.97 | 1 | 0.6%
71.02 | 1 | 0.6%
69.725 | 1 | 0.6%

Engine_size
Real number (ℝ≥0)

Distinct count: 31
Unique (%): 19.9%
Missing: 1
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.0608974358974357
Minimum: 1.0
Maximum: 8.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1.8
Q1: 2.3
Median: 3
Q3: 3.575
95-th percentile: 4.775
Maximum: 8
Range: 7
Interquartile range (IQR): 1.275

Descriptive statistics

Standard deviation: 1.044652973
Coefficient of variation (CV): 0.3412897672
Kurtosis: 2.344782023
Mean: 3.060897436
Median Absolute Deviation (MAD): 0.7
Skewness: 1.100447343
Sum: 477.5
Variance: 1.091299835

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2 | 17 | 10.8%
3 | 14 | 8.9%
2.5 | 11 | 7.0%
2.4 | 11 | 7.0%
4.6 | 9 | 5.7%
3.5 | 8 | 5.1%
3.8 | 8 | 5.1%
1.8 | 8 | 5.1%
4 | 7 | 4.5%
3.4 | 7 | 4.5%
Other values (21) | 56 | 35.7%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | 0.6%
1.5 | 1 | 0.6%
1.6 | 1 | 0.6%
1.8 | 8 | 5.1%
1.9 | 5 | 3.2%

Maximum 5 values

Value | Count | Frequency (%)
8 | 1 | 0.6%
5.7 | 2 | 1.3%
5.4 | 1 | 0.6%
5.2 | 2 | 1.3%
5 | 2 | 1.3%

Horsepower
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 66
Unique (%): 42.3%
Missing: 1
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 185.94871794871796
Minimum: 55.0
Maximum: 450.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 55
5-th percentile: 114.5
Q1: 149.5
Median: 177.5
Q3: 215
95-th percentile: 300
Maximum: 450
Range: 395
Interquartile range (IQR): 65.5

Descriptive statistics

Standard deviation: 56.70032086
Coefficient of variation (CV): 0.3049245054
Kurtosis: 2.406657478
Mean: 185.9487179
Median Absolute Deviation (MAD): 32.5
Skewness: 1.000694992
Sum: 29008
Variance: 3214.926385

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
170 | 9 | 5.7%
150 | 9 | 5.7%
200 | 8 | 5.1%
210 | 7 | 4.5%
115 | 6 | 3.8%
275 | 5 | 3.2%
175 | 5 | 3.2%
185 | 5 | 3.2%
190 | 4 | 2.5%
120 | 4 | 2.5%
Other values (56) | 94 | 59.9%

Minimum 5 values

Value | Count | Frequency (%)
55 | 1 | 0.6%
92 | 1 | 0.6%
100 | 2 | 1.3%
106 | 1 | 0.6%
107 | 1 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
450 | 1 | 0.6%
345 | 1 | 0.6%
310 | 1 | 0.6%
302 | 2 | 1.3%
300 | 4 | 2.5%

Wheelbase
Real number (ℝ≥0)

Distinct count: 88
Unique (%): 56.4%
Missing: 1
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.48717948717949
Minimum: 92.6
Maximum: 138.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 92.6
5-th percentile: 95.875
Q1: 103
Median: 107
Q3: 112.2
95-th percentile: 119.25
Maximum: 138.7
Range: 46.1
Interquartile range (IQR): 9.2

Descriptive statistics

Standard deviation: 7.64130303
Coefficient of variation (CV): 0.07109036693
Kurtosis: 2.859284871
Mean: 107.4871795
Median Absolute Deviation (MAD): 4.6
Skewness: 0.9699356566
Sum: 16768
Variance: 58.38951199

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
112.2 | 8 | 5.1%
113 | 5 | 3.2%
107 | 5 | 3.2%
109 | 4 | 2.5%
98.9 | 4 | 2.5%
108 | 4 | 2.5%
102.4 | 4 | 2.5%
107.3 | 4 | 2.5%
106.5 | 4 | 2.5%
106.4 | 4 | 2.5%
Other values (78) | 110 | 70.1%

Minimum 5 values

Value | Count | Frequency (%)
92.6 | 2 | 1.3%
93.1 | 1 | 0.6%
93.4 | 1 | 0.6%
94.5 | 2 | 1.3%
94.9 | 1 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
138.7 | 1 | 0.6%
138.5 | 1 | 0.6%
131 | 1 | 0.6%
127.2 | 1 | 0.6%
121.5 | 1 | 0.6%

Width
Real number (ℝ≥0)

Distinct count: 78
Unique (%): 50.0%
Missing: 1
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.14999999999999
Minimum: 62.6
Maximum: 79.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 62.6
5-th percentile: 66.5
Q1: 68.4
Median: 70.55
Q3: 73.425
95-th percentile: 78.2
Maximum: 79.9
Range: 17.3
Interquartile range (IQR): 5.025

Descriptive statistics

Standard deviation: 3.451871862
Coefficient of variation (CV): 0.0485154162
Kurtosis: -0.3004675291
Mean: 71.15
Median Absolute Deviation (MAD): 2.4
Skewness: 0.4838620694
Sum: 11099.4
Variance: 11.91541935

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
66.7 | 6 | 3.8%
74.4 | 6 | 3.8%
68.3 | 5 | 3.2%
70.3 | 5 | 3.2%
72.7 | 5 | 3.2%
69.1 | 4 | 2.5%
66.5 | 4 | 2.5%
69.4 | 4 | 2.5%
71.7 | 3 | 1.9%
73.6 | 3 | 1.9%
Other values (68) | 111 | 70.7%

Minimum 5 values

Value | Count | Frequency (%)
62.6 | 1 | 0.6%
65.7 | 1 | 0.6%
66.4 | 3 | 1.9%
66.5 | 4 | 2.5%
66.7 | 6 | 3.8%

Maximum 5 values

Value | Count | Frequency (%)
79.9 | 1 | 0.6%
79.3 | 1 | 0.6%
79.1 | 1 | 0.6%
78.8 | 2 | 1.3%
78.7 | 1 | 0.6%

Length
Real number (ℝ≥0)

Distinct count: 127
Unique (%): 81.4%
Missing: 1
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 187.34358974358975
Minimum: 149.4
Maximum: 224.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 149.4
5-th percentile: 163.675
Q1: 177.575
Median: 187.9
Q3: 196.125
95-th percentile: 208.5
Maximum: 224.5
Range: 75.1
Interquartile range (IQR): 18.55

Descriptive statistics

Standard deviation: 13.43175428
Coefficient of variation (CV): 0.07169583065
Kurtosis: 0.3025740191
Mean: 187.3435897
Median Absolute Deviation (MAD): 9.4
Skewness: -0.05904682307
Sum: 29225.6
Variance: 180.4120232

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
186.3 | 4 | 2.5%
192 | 3 | 1.9%
189.2 | 3 | 1.9%
190.4 | 3 | 1.9%
176.9 | 2 | 1.3%
201.2 | 2 | 1.3%
194.8 | 2 | 1.3%
176.6 | 2 | 1.3%
163.3 | 2 | 1.3%
174.5 | 2 | 1.3%
Other values (117) | 131 | 83.4%

Minimum 5 values

Value | Count | Frequency (%)
149.4 | 1 | 0.6%
152 | 1 | 0.6%
157.3 | 1 | 0.6%
157.9 | 1 | 0.6%
160.4 | 1 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
224.5 | 1 | 0.6%
224.2 | 1 | 0.6%
215.3 | 1 | 0.6%
215 | 1 | 0.6%
212 | 2 | 1.3%

Curb_weight
Real number (ℝ≥0)

MISSING
Distinct count: 147
Unique (%): 94.8%
Missing: 2
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.378025806451613
Minimum: 1.895
Maximum: 5.572
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 1.895
5-th percentile: 2.4235
Q1: 2.971
Median: 3.342
Q3: 3.7995
95-th percentile: 4.3891
Maximum: 5.572
Range: 3.677
Interquartile range (IQR): 0.8285

Descriptive statistics

Standard deviation: 0.6305016344
Coefficient of variation (CV): 0.1866479626
Kurtosis: 1.265453569
Mean: 3.378025806
Median Absolute Deviation (MAD): 0.41
Skewness: 0.7081582404
Sum: 523.594
Variance: 0.397532311

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2.998 | 3 | 1.9%
2.769 | 3 | 1.9%
3.876 | 2 | 1.3%
2.91 | 2 | 1.3%
3.075 | 2 | 1.3%
3.368 | 2 | 1.3%
2.567 | 1 | 0.6%
4.47 | 1 | 0.6%
2.398 | 1 | 0.6%
2.425 | 1 | 0.6%
Other values (137) | 137 | 87.3%
(Missing) | 2 | 1.3%

Minimum 5 values

Value | Count | Frequency (%)
1.895 | 1 | 0.6%
2.24 | 1 | 0.6%
2.25 | 1 | 0.6%
2.332 | 1 | 0.6%
2.339 | 1 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
5.572 | 1 | 0.6%
5.401 | 1 | 0.6%
5.393 | 1 | 0.6%
5.115 | 1 | 0.6%
4.808 | 1 | 0.6%

Fuel_capacity
Real number (ℝ≥0)

Distinct count: 55
Unique (%): 35.3%
Missing: 1
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.951923076923077
Minimum: 10.3
Maximum: 32.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 10.3
5-th percentile: 12.5
Q1: 15.8
Median: 17.2
Q3: 19.575
95-th percentile: 25.4
Maximum: 32
Range: 21.7
Interquartile range (IQR): 3.775

Descriptive statistics

Standard deviation: 3.887921265
Coefficient of variation (CV): 0.2165740822
Kurtosis: 2.07281321
Mean: 17.95192308
Median Absolute Deviation (MAD): 1.9
Skewness: 1.136712406
Sum: 2800.5
Variance: 15.11593176

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
18.5 | 14 | 8.9%
17 | 9 | 5.7%
19 | 8 | 5.1%
20 | 8 | 5.1%
16 | 7 | 4.5%
13.2 | 6 | 3.8%
15.9 | 6 | 3.8%
17.5 | 5 | 3.2%
15 | 5 | 3.2%
14.5 | 5 | 3.2%
Other values (45) | 83 | 52.9%

Minimum 5 values

Value | Count | Frequency (%)
10.3 | 1 | 0.6%
11.9 | 2 | 1.3%
12 | 1 | 0.6%
12.1 | 3 | 1.9%
12.5 | 2 | 1.3%

Maximum 5 values

Value | Count | Frequency (%)
32 | 2 | 1.3%
30 | 2 | 1.3%
26 | 3 | 1.9%
25.4 | 2 | 1.3%
25.1 | 1 | 0.6%

Fuel_efficiency
Real number (ℝ≥0)

MISSING
Distinct count: 20
Unique (%): 13.0%
Missing: 3
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.844155844155843
Minimum: 15.0
Maximum: 45.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 15
5-th percentile: 16.65
Q1: 21
Median: 24
Q3: 26
95-th percentile: 31
Maximum: 45
Range: 30
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.282705562
Coefficient of variation (CV): 0.1796123792
Kurtosis: 3.241130815
Mean: 23.84415584
Median Absolute Deviation (MAD): 2
Skewness: 0.6923277567
Sum: 3672
Variance: 18.34156693

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
25 | 23 | 14.6%
24 | 16 | 10.2%
27 | 15 | 9.6%
23 | 14 | 8.9%
22 | 14 | 8.9%
21 | 14 | 8.9%
26 | 12 | 7.6%
19 | 6 | 3.8%
20 | 5 | 3.2%
18 | 5 | 3.2%
Other values (10) | 30 | 19.1%

Minimum 5 values

Value | Count | Frequency (%)
15 | 5 | 3.2%
16 | 3 | 1.9%
17 | 3 | 1.9%
18 | 5 | 3.2%
19 | 6 | 3.8%

Maximum 5 values

Value | Count | Frequency (%)
45 | 1 | 0.6%
33 | 4 | 2.5%
32 | 1 | 0.6%
31 | 3 | 1.9%
30 | 5 | 3.2%

Latest_Launch
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 130
Unique (%): 82.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Common values

Value | Count | Frequency (%)
10/5/2012 | 2 | 1.3%
8/27/2011 | 2 | 1.3%
8/31/2011 | 2 | 1.3%
11/14/2011 | 2 | 1.3%
1/4/2012 | 2 | 1.3%
4/1/2011 | 2 | 1.3%
9/25/2011 | 2 | 1.3%
11/24/2012 | 2 | 1.3%
6/27/2012 | 2 | 1.3%
4/11/2011 | 2 | 1.3%
Other values (120) | 137 | 87.3%

Length

Max length: 10
Mean length: 8.968152866
Min length: 8

Unicode categories

Value | Count | Frequency (%)
Decimal_Number | 10 | 90.9%
Other_Punctuation | 1 | 9.1%

Unicode scripts

Value | Count | Frequency (%)
Common | 11 | 100.0%

Unicode blocks

Value | Count | Frequency (%)
ASCII | 11 | 100.0%

Power_perf_factor
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 154
Unique (%): 99.4%
Missing: 2
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 77.0435912007742
Minimum: 23.27627233
Maximum: 188.14432299999999
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 23.27627233
5-th percentile: 46.2039974
Q1: 60.40770678
Median: 72.03091719
Q3: 89.41487752
95-th percentile: 125.0915129
Maximum: 188.144323
Range: 164.8680507
Interquartile range (IQR): 29.00717075

Descriptive statistics

Standard deviation: 25.1426641
Coefficient of variation (CV): 0.3263433558
Kurtosis: 2.081292892
Mean: 77.0435912
Median Absolute Deviation (MAD): 14.24160572
Skewness: 1.070634989
Sum: 11941.75664
Variance: 632.153558

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
52.08489875 | 2 | 1.3%
76.50918456 | 1 | 0.6%
90.21170005 | 1 | 0.6%
70.07832154 | 1 | 0.6%
62.44196235 | 1 | 0.6%
54.26954829 | 1 | 0.6%
106.9844563 | 1 | 0.6%
70.38973726 | 1 | 0.6%
80.65769646 | 1 | 0.6%
81.49272616 | 1 | 0.6%
Other values (144) | 144 | 91.7%
(Missing) | 2 | 1.3%

Minimum 5 values

Value | Count | Frequency (%)
23.27627233 | 1 | 0.6%
36.67228358 | 1 | 0.6%
39.98642475 | 1 | 0.6%
40.70007242 | 1 | 0.6%
42.87909734 | 1 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
188.144323 | 1 | 0.6%
141.14115 | 1 | 0.6%
141.1009845 | 1 | 0.6%
139.9822936 | 1 | 0.6%
135.9147096 | 1 | 0.6%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
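
The calculation just described can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code pandas-profiling runs; the function name `pearson_r` and the toy data are assumptions made for the example. The 1/n factors of the covariance and the standard deviations cancel, so they are omitted:

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations (common 1/n factor dropped).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up, roughly linear toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r_original = pearson_r(x, y)
# Shifting and rescaling y (positive scale) should leave r unchanged.
r_rescaled = pearson_r(x, [10 * v + 3 for v in y])
print(r_original, r_rescaled)
```

Rows with missing values would have to be dropped pairwise before calling such a function.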

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
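
As a sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. The helper names below are illustrative, and giving tied values the average of their ranks is an assumption (it is a common convention, but the report does not say how ties are handled):

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Monotonic but nonlinear toy data: y = x squared.
xs = [1, 2, 3, 4, 5]
ys = [v * v for v in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this toy data the ranks of both variables coincide, so ρ reaches its maximum while Pearson's r stays below it.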

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
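
A direct, quadratic-time sketch of that pair-counting definition follows. It implements the simple variant often called tau-a, where tied pairs count as neither concordant nor discordant; library implementations commonly use a tie-adjusted variant (tau-b), so results can differ when the data contain ties. The function name is illustrative:

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0: a tie in x or y, counted in neither group
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data with one swapped pair: 5 concordant and 1 discordant out of 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```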

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
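
A minimal sketch of a bias-corrected Cramér's V in plain Python is given below. The correction terms follow the form usually attributed to Bergsma (2013); whether the report's implementation matches it term for term is an assumption, and the function name is illustrative. The chi-squared statistic is computed without a continuity correction:

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    # Correction terms: shrink phi2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(phi2_corr / denom)


# Toy data: perfectly associated labels, then independent labels.
v_perfect = cramers_v_corrected(["a"] * 10 + ["b"] * 10, ["p"] * 10 + ["q"] * 10)
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["p", "q", "p", "q"])
print(v_perfect, v_independent)
```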

Missing values

Sample

First rows

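Index | Manufacturer | Model | Sales_in_thousands | four_year_resale_value | Vehicle_type | Price_in_thousands | Engine_size | Horsepower | Wheelbase | Width | Length | Curb_weight | Fuel_capacity | Fuel_efficiency | Latest_Launch | Power_perf_factor
0 | Acura | Integra | 16.919 | 16.360 | Passenger | 21.50 | 1.8 | 140.0 | 101.2 | 67.3 | 172.4 | 2.639 | 13.2 | 28.0 | 2/2/2012 | 58.280150
1 | Acura | TL | 39.384 | 19.875 | Passenger | 28.40 | 3.2 | 225.0 | 108.1 | 70.3 | 192.9 | 3.517 | 17.2 | 25.0 | 6/3/2011 | 91.370778
2 | Acura | CL | 14.114 | 18.225 | Passenger | NaN | 3.2 | 225.0 | 106.9 | 70.6 | 192.0 | 3.470 | 17.2 | 26.0 | 1/4/2012 | NaN
3 | Acura | RL | 8.588 | 29.725 | Passenger | 42.00 | 3.5 | 210.0 | 114.6 | 71.4 | 196.6 | 3.850 | 18.0 | 22.0 | 3/10/2011 | 91.389779
4 | Audi | A4 | 20.397 | 22.255 | Passenger | 23.99 | 1.8 | 150.0 | 102.6 | 68.2 | 178.0 | 2.998 | 16.4 | 27.0 | 10/8/2011 | 62.777639
5 | Audi | A6 | 18.780 | 23.555 | Passenger | 33.95 | 2.8 | 200.0 | 108.7 | 76.1 | 192.0 | 3.561 | 18.5 | 22.0 | 8/9/2011 | 84.565105
6 | Audi | A8 | 1.380 | 39.000 | Passenger | 62.00 | 4.2 | 310.0 | 113.0 | 74.0 | 198.2 | 3.902 | 23.7 | 21.0 | 2/27/2012 | 134.656858
7 | BMW | 323i | 19.747 | NaN | Passenger | 26.99 | 2.5 | 170.0 | 107.3 | 68.4 | 176.0 | 3.179 | 16.6 | 26.0 | 6/28/2011 | 71.191207
8 | BMW | 328i | 9.231 | 28.675 | Passenger | 33.40 | 2.8 | 193.0 | 107.3 | 68.5 | 176.0 | 3.197 | 16.6 | 24.0 | 1/29/2012 | 81.877069
9 | BMW | 528i | 17.527 | 36.125 | Passenger | 38.90 | 2.8 | 193.0 | 111.4 | 70.9 | 188.0 | 3.472 | 18.5 | 25.0 | 4/4/2011 | 83.998724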

Last rows

Index | Manufacturer | Model | Sales_in_thousands | four_year_resale_value | Vehicle_type | Price_in_thousands | Engine_size | Horsepower | Wheelbase | Width | Length | Curb_weight | Fuel_capacity | Fuel_efficiency | Latest_Launch | Power_perf_factor
147 | Volkswagen | Passat | 51.102 | 16.725 | Passenger | 21.20 | 1.8 | 150.0 | 106.4 | 68.5 | 184.1 | 3.043 | 16.4 | 27.0 | 10/30/2012 | 61.701381
148 | Volkswagen | Cabrio | 9.569 | 16.575 | Passenger | 19.99 | 2.0 | 115.0 | 97.4 | 66.7 | 160.4 | 3.079 | 13.7 | 26.0 | 5/31/2011 | 48.907372
149 | Volkswagen | GTI | 5.596 | 13.760 | Passenger | 17.50 | 2.0 | 115.0 | 98.9 | 68.3 | 163.3 | 2.762 | 14.6 | 26.0 | 4/1/2011 | 47.946841
150 | Volkswagen | Beetle | 49.463 | NaN | Passenger | 15.90 | 2.0 | 115.0 | 98.9 | 67.9 | 161.1 | 2.769 | 14.5 | 26.0 | 10/20/2011 | 47.329632
151 | Volvo | S40 | 16.957 | NaN | Passenger | 23.40 | 1.9 | 160.0 | 100.5 | 67.6 | 176.6 | 2.998 | 15.8 | 25.0 | 2/18/2011 | 66.113057
152 | Volvo | V40 | 3.545 | NaN | Passenger | 24.40 | 1.9 | 160.0 | 100.5 | 67.6 | 176.6 | 3.042 | 15.8 | 25.0 | 9/21/2011 | 66.498812
153 | Volvo | S70 | 15.245 | NaN | Passenger | 27.50 | 2.4 | 168.0 | 104.9 | 69.3 | 185.9 | 3.208 | 17.9 | 25.0 | 11/24/2012 | 70.654495
154 | Volvo | V70 | 17.531 | NaN | Passenger | 28.80 | 2.4 | 168.0 | 104.9 | 69.3 | 186.2 | 3.259 | 17.9 | 25.0 | 6/25/2011 | 71.155978
155 | Volvo | C70 | 3.493 | NaN | Passenger | 45.50 | 2.3 | 236.0 | 104.9 | 71.5 | 185.7 | 3.601 | 18.5 | 23.0 | 4/26/2011 | 101.623357
156 | Volvo | S80 | 18.969 | NaN | Passenger | 36.00 | 2.9 | 201.0 | 109.9 | 72.1 | 189.8 | 3.600 | 21.1 | 24.0 | 11/14/2011 | 85.735655